There are three categories of built-in collection types in Python:
array
and collections
provide additional collection types
keys
set
is an unordered collection of items that contains no duplicates. (Mutable)
frozenset
is an immutable version of set.
set('TCAGTTAT')
DNABases = {'T', 'C', 'A', 'G'}
RNABases = {'U', 'C', 'A', 'G'}
DNABases
RNABases
{'TCAG'}
{'TCAG', 'UCAG'}
{'AATTGC'}
set('AATTGC')
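As a small illustration of the difference between set and frozenset, the sketch below builds a set from a string with repeated characters and then tries to modify a frozenset. The variable names are chosen for this example only.

```python
# Building a set from a string keeps one copy of each character.
bases = set('TCAGTTAT')
print(bases == {'T', 'C', 'A', 'G'})   # True; the display order of a set is arbitrary

# A set is mutable: items can be added after creation.
bases.add('U')
print(len(bases))                      # 5

# A frozenset has no add method, so it cannot be changed.
frozen_bases = frozenset('TCAG')
try:
    frozen_bases.add('U')
except AttributeError as err:
    print('cannot modify a frozenset:', err)
```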
# Rewrite validate_base_sequence using sets
DNAbases = set('TCAGtcag')
RNAbases = set('UCAGucag')
def validate_base_sequence(base_sequence, RNAflag=False):
    """Return True if the string base_sequence contains only upper- or lowercase
    T (or U, if RNAflag), C, A, and G characters, otherwise False"""
    return set(base_sequence) <= (RNAbases if RNAflag else DNAbases)
validate_base_sequence('tattattat',True)
set('tattattat')
RNAbases
validate_base_sequence('atgcwrqatgc')
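The validation above relies on the subset comparison between sets. A minimal sketch of that comparison on its own, with the set difference added (an extra step not in the function above) to show which characters are rejected:

```python
DNAbases = set('TCAGtcag')

# set(sequence) collapses the string to its distinct characters.
print(set('tattattat') == {'t', 'a'})     # True

# <= tests whether every element on the left is also in the set on the right.
print(set('tattattat') <= DNAbases)       # True
print(set('atgcwrqatgc') <= DNAbases)     # False

# Set difference shows which characters caused the failure.
invalid = set('atgcwrqatgc') - DNAbases
print(sorted(invalid))                    # ['q', 'r', 'w']
```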
ranges
and files
support the methods count and index. They also work with the reversed function, which returns a special object that produces the elements of the sequence in reverse order.
str()
- Returns the empty string
str(obj)
- Returns a printable representation of obj, as specified by the definition of the type of obj
chr(n)
- Returns the one-character string corresponding to the integer n in the Unicode system
ord(char)
- Returns the Unicode number corresponding to the one-character string char
str()
str(4)
chr(105)
ord('D')
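For reference, a short sketch of what these calls return, plus a check that chr and ord undo each other:

```python
print(repr(str()))            # ''
print(repr(str(4)))           # '4'
print(chr(105))               # i
print(ord('D'))               # 68

# chr and ord are inverses of each other.
print(chr(ord('D')) == 'D')   # True
print(ord(chr(105)) == 105)   # True
```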
str1.isalpha()
- Returns True if str1 is not empty and all of its characters are alphabetic
str1.isdigit()
- Returns True if str1 is not empty and all of its characters are digits
str1.islower()
- Returns True if str1 contains at least one "cased" character and all of its cased characters are lowercase
str1.isupper()
- Returns True if str1 contains at least one "cased" character and all of its cased characters are uppercase
str1.startswith(str2[, startpos[, endpos]])
- Returns True if str1 starts with str2
str1.endswith(str2[, startpos[, endpos]])
- Returns True if str1 ends with str2
str1.find(str2[, startpos[, endpos]])
- Returns the lowest index of str1 at which str2 is found, or -1 if it is not found
str1.index(str2[, startpos[, endpos]])
- Returns the lowest index of str1 at which str2 is found, or raises ValueError if it is not found
str1.count(str2[, startpos[, endpos]])
- Returns the number of non-overlapping occurrences of str2 in str1
str1.replace(oldstr, newstr[, count])
- Returns a copy of str1 with all occurrences of the substring oldstr replaced by the string newstr; if count is specified, only the first count occurrences are replaced.
str1.lower()
- Returns a copy of the string with all of its characters converted to lowercase
str1.upper()
- Returns a copy of the string with all of its characters converted to uppercase
str1.capitalize()
- Returns a copy of the string with its first character capitalized and the rest lowercased; the first character is left unchanged if it is not a letter (e.g., if it is a space)
str1.title()
- Returns a copy of the string with each word beginning with an uppercase character and the rest lowercase
str1.swapcase()
- Returns a copy of the string with lowercase characters made uppercase and vice versa
str1.lstrip([chars])
- Returns a copy of str1 with leading characters removed: whitespace by default, or any of the characters in chars if it is given.
str1.rstrip([chars])
- Returns a copy of str1 with trailing characters removed: whitespace by default, or any of the characters in chars if it is given.
str1.strip([chars])
- Returns a copy of str1 with leading and trailing characters removed: whitespace by default, or any of the characters in chars if it is given.
str1.ljust(width[, fillchar])
- Returns str1 left-justified in a new string of length width, "padded" with fillchar (the default fill character is a space).
str1.rjust(width[, fillchar])
- Returns str1 right-justified in a new string of length width, "padded" with fillchar (the default fill character is a space).
str1.center(width[, fillchar])
- Returns str1 centered in a new string of length width, "padded" with fillchar (the default fill character is a space).
format(value[, format-specification])
- Returns a string obtained by formatting value according to the format-specification; if no format-specification is provided, this is equivalent to str(value).
format-specification.format(posargs, ..., kwdargs, ...)
- Returns a string formatted according to the format-specification; any number of positional arguments may be followed by any number of keyword arguments
str1 = 'a string'
'"{0}" contains {1} characters'.format(str1, len(str1))
'"{}" contains {} characters'.format(str1, len(str1))
'"{string}" contains {length} characters'.format(string=str1, length=len(str1))
'"{string}" contains {length} characters'.format(length=len(str1),string=str1)
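A few of the string methods listed above, sketched on a made-up sequence string (the value of seq is an assumption for illustration). The example contrasts find with index and shows the built-in format function applied to a single value.

```python
seq = '  ATGCATGC  '

# strip with no argument removes leading and trailing whitespace.
trimmed = seq.strip()
print(trimmed)                           # ATGCATGC

# find returns -1 when the substring is absent; index raises ValueError.
print(trimmed.find('GC'))                # 2
print(trimmed.find('TT'))                # -1
try:
    trimmed.index('TT')
except ValueError:
    print('index raised ValueError')

# count tallies occurrences; replace can be limited by its count argument.
print(trimmed.count('ATG'))              # 2
print(trimmed.replace('T', 'U', 1))      # AUGCATGC

# The built-in format applies a format specification to one value.
print(format(3.14159, '.2f'))            # 3.14
print(trimmed.lower().center(12, '*'))   # **atgcatgc**
```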
A range represents a series of integers.
range(stop)
- creates a range representing the integers from 0 up to but not including stop.
range(start, stop)
- creates a range representing the integers from start up to but not including stop.
range(start, stop, step)
- creates a range representing the integers from start up to but not including stop, in increments of step.
range(5)
set(range(5))
set(range(5, 10))
set(range(5, 10, 2))
set(range(15, 10, -2))
set(range(0, -25, -5))
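Because a set does not preserve order, converting a range to a list may be a clearer way to see exactly which integers it produces and in what order. A brief sketch using the same ranges:

```python
print(list(range(5)))             # [0, 1, 2, 3, 4]
print(list(range(5, 10)))         # [5, 6, 7, 8, 9]
print(list(range(5, 10, 2)))      # [5, 7, 9]
print(list(range(15, 10, -2)))    # [15, 13, 11]
print(list(range(0, -25, -5)))    # [0, -5, -10, -15, -20]

# A range supports len and membership tests without building a list.
print(len(range(5, 10, 2)))       # 3
print(7 in range(5, 10, 2))       # True
```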
('TCAG', 'UCAG') # 2 element tuple
('TCAG',) # 1 element tuple
() # empty tuple
('TCAG') # Not a tuple!
tuple('TCAG')
tuple(range(5,10))
bases = 'TCAG', 'UCAG'
bases
DNABases, RNABases = 'TCAG', 'UCAG'
DNABases
RNABases
def recognition_site(base_seq, recognition_seq):
    return base_seq.find(recognition_seq)

def restriction_cut(base_seq, recognition_seq, offset=0):
    """Return a pair of sequences derived from base_seq by splitting it at the first appearance
    of recognition_seq; offset, which may be negative, is the number of bases relative to the
    beginning of the site where the sequence is cut"""
    site = recognition_site(base_seq, recognition_seq)
    return base_seq[:site+offset], base_seq[site+offset:]
aseq1 = 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA'
left, right = restriction_cut(aseq1, 'TCCGGA')
left
right
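Note that str.find returns -1 when the recognition sequence is absent, in which case the slices above would split the sequence near its end. One possible way to guard against that is sketched below; restriction_cut_checked is a hypothetical name, and returning the whole sequence with an empty right half is an assumption about the desired behavior, not part of the original function.

```python
def restriction_cut_checked(base_seq, recognition_seq, offset=0):
    """Hypothetical variant of restriction_cut: split base_seq at the first
    appearance of recognition_seq (shifted by offset), or return
    (base_seq, '') if the recognition sequence does not occur."""
    site = base_seq.find(recognition_seq)
    if site == -1:
        # Assumed behavior: no site means no cut.
        return base_seq, ''
    cut = site + offset
    return base_seq[:cut], base_seq[cut:]

aseq1 = 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA'

# Cut one base into the site, leaving the leading T on the left piece.
left, right = restriction_cut_checked(aseq1, 'TCCGGA', 1)
print(left.endswith('T'), right.startswith('CCGGA'))   # True True
print(left + right == aseq1)                           # True

# A recognition sequence that never occurs leaves the input whole.
print(restriction_cut_checked(aseq1, 'GGGGGG') == (aseq1, ''))   # True
```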
a, b = 4, 2
a
b
a, b = b, a
a
b
del lst[n]
- remove the nth element from lst
del lst[i:j]
- remove the ith up to, but not including, the jth elements from lst
del lst[i:j:k]
- remove every kth element of lst from index i up to, but not including, index j
list1 = [1,2,3]
list2 = [4,5]
list1 + list2 # Concatenation, list1 & list2 remain unchanged
list1.extend(list2)
list1
list2
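The three forms of del, and the difference between + and extend, can be traced on small lists. This is a sketch with example values chosen only to make each step visible:

```python
lst = list(range(10))        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

del lst[0]                   # remove the element at index 0
print(lst)                   # [1, 2, 3, 4, 5, 6, 7, 8, 9]

del lst[1:3]                 # remove indexes 1 and 2 (index 3 is not included)
print(lst)                   # [1, 4, 5, 6, 7, 8, 9]

del lst[0:6:2]               # remove indexes 0, 2, and 4
print(lst)                   # [4, 6, 8, 9]

list1 = [1, 2, 3]
list2 = [4, 5]
combined = list1 + list2     # builds a new list; list1 and list2 are unchanged
print(combined, list1)       # [1, 2, 3, 4, 5] [1, 2, 3]

list1.extend(list2)          # modifies list1 in place; list2 is unchanged
print(list1, list2)          # [1, 2, 3, 4, 5] [4, 5]
```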
string.splitlines([keepflg])
- Returns a list of the “lines” in string, splitting at end-of-line characters. If keepflg is omitted or is false, the end-of-line characters are not included in the lines; otherwise, they are.
string.split([sepr[, maxwords]])
- Returns a list of the “words” in string, using sepr as a word delineator. In the special case where sepr is omitted or is None, words are delineated by any consecutive whitespace characters; if maxwords is specified the result will have at most maxwords+1 elements.
string.rsplit([sepr[, maxwords]])
- Performs a reverse split: same as split except that if maxwords is specified and its value is less than the number of words in string the result returned is a list containing the last maxwords+1 words.
sepr.join(seq)
- Returns a string formed by concatenating the strings in seq separated by sepr, which can be any string (including the empty string).
string.partition(sepr)
- Returns a tuple with three elements: the portion of string up to the first occurrence of sepr, sepr, and the portion of string after the first occurrence of sepr. If sepr is not found in string, the tuple is (string, '', '').
string.rpartition(sepr)
- Returns a tuple with three elements: the portion of string up to the last occurrence of sepr, sepr, and the portion of string after the last occurrence of sepr. If sepr is not found in string, the tuple is ('', '', string)
a = [ 'a', 'b', 'c', 'd', 'e']
abc = ','.join(a)
abc
abc.split()
abc.split(',')
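To compare the splitting methods side by side, here is a short sketch on a made-up comma-separated string (the value of text is an assumption for illustration):

```python
text = 'gene1,gene2,gene3'

# split and rsplit differ only when maxwords limits the number of splits.
print(text.split(','))         # ['gene1', 'gene2', 'gene3']
print(text.split(',', 1))      # ['gene1', 'gene2,gene3']
print(text.rsplit(',', 1))     # ['gene1,gene2', 'gene3']

# partition and rpartition always return a 3-element tuple.
print(text.partition(','))     # ('gene1', ',', 'gene2,gene3')
print(text.rpartition(','))    # ('gene1,gene2', ',', 'gene3')
print(text.partition(';'))     # ('gene1,gene2,gene3', '', '')

# splitlines drops the end-of-line characters unless keepflg is true.
lines = 'line one\nline two\n'
print(lines.splitlines())      # ['line one', 'line two']
print(lines.splitlines(True))  # ['line one\n', 'line two\n']

# join is the inverse of split for a fixed separator.
print(','.join(text.split(',')) == text)   # True
```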